(57) Abstract: Detecting harmful or illegal intrusions into a
computer network or into restricted portions of a computer network
uses a features generator or builder to generate a feature
reflecting changes in user and user group behavior over time.
User and user group historical means and standard deviations
are used to generate a feature that is not dependent on rigid or
static rule sets. These statistical and historical values are calculated
by accessing user activity data listing activities performed
by users on the computer system. Historical information is then
calculated based on the activities performed by users on the
computer system. The feature is calculated using the historical
information based on the activities of the user or group of users.
The feature is then utilized by a model to obtain a value or score
which indicates the likelihood of an intrusion into the computer
network. The historical values are adjusted according to shifts
in normal behavior of users of the computer system. This allows
for calculation of the feature to reflect changing characteristics
of the users on the computer system.
Features Generation for use in Computer Network Intrusion Detection

Background of the Invention

1. Field of the Invention 

The present invention relates generally to the field of computer systems 
software and computer network security. More specifically, it relates to software for 
examining user and group activity in a computer network for detecting intrusions and 
security violations in the network.

2. Discussion of Related Art 

Computer network security is an important issue for all types of organizations 
and enterprises. Computer break-ins and their misuse have become common features.
The number, as well as sophistication, of attacks on computer systems is on the rise.
Often, network intruders have easily overcome the password authentication
mechanism designed to protect the system. With an increased understanding of how
systems work, intruders have become skilled at determining their weaknesses and
exploiting them to obtain unauthorized privileges. Intruders also use patterns of
intrusion that are often difficult to trace and identify. They use several levels of
indirection before breaking into target systems and rarely indulge in sudden bursts of
suspicious or anomalous activity. If an account on a target system is compromised,
intruders can carefully cover their tracks so as not to arouse suspicion. Furthermore,
threats like viruses and worms do not need human supervision and are capable of
replicating and traveling to connected computer systems. Once they are unleashed at
one computer, by the time they are discovered it is almost impossible to trace their
origin or the extent of infection.

As the number of users within a particular entity grows, the risks from
unauthorized intrusions into computer systems or into certain sensitive components of
a large computer system increase. In order to maintain a reliable and secure computer
network, regardless of network size, exposure to potential network intrusions must be
reduced as much as possible. Network intrusions can originate from legitimate users
within an entity attempting to access secure portions of the network or can originate
from illegitimate users outside an entity, often referred to as "hackers," attempting
to break into the entity's network. Intrusions from either of these two groups of users
can be damaging to an organization's computer network. Most attempted security
violations are internal; that is, they are attempted by employees of an enterprise or
organization.

One approach to detecting computer network intrusions is calculating
"features" based on various factors, such as command sequences, user activity,
machine usage loads, resource violations, files accessed, data transferred, terminal
activity, and network activity, among others. Features are then used as input to a model
or expert system which determines whether a possible intrusion or violation has
occurred. The use of features is well-known in various fields in computer science
including the field of computer network security, especially in conjunction with an
expert system which evaluates the feature values. Features used in present computer
security systems are generally rule-based features. Such features lead to computer
security systems that are inflexible, highly complex, and require frequent upgrading
and maintenance.

Expert systems that use such features generally use thresholds (e.g., "if-then-else"
clauses, "case" statements, etc.) to define whether there was a violation.
Thus, a human expert with extensive knowledge of the computer network domain has
to accurately define and assign such thresholds for the system to be effective.
These thresholds and other rules are typically not modified often and do not reflect
day-to-day fluctuations based on changing user behavior. Such rules are
entered by an individual with extensive domain knowledge of the particular system.
In short, such systems lack the robustness needed to detect increasingly sophisticated
lines of attack in a computer system. A reliable computer system must be able to
accurately determine when a possible intrusion is occurring and who the intruder is,
and do so by taking into account trends in user activity.

As mentioned above, rule-based features can also be used as input to a model
instead of an expert system. However, a model that can accept only rule-based
features and cannot be trained to adjust to trends and changing needs in a computer
network generally suffers from the same drawbacks as the expert system
configuration. A model is generally used in conjunction with a features generator and
accepts as input a features list. However, models presently used in computer network
intrusion detection systems are not trained to take into account changing requirements
and user trends in a computer network. Thus, such models also lead to computer
security systems that are inflexible, complex, and require frequent upgrading and
maintenance.

FIG. 1 is a block diagram depicting certain components in a security system in
a computer network as is presently known in the art. A features/expert systems
component 10 of a complete network security system (not shown) has three general
components: user activity 12, expert system 14, and alert messages 16. User activity
12 contains "raw" data, typically in the form of aggregated log files, and is raw in that
it is typically unmodified or has not gone through significant preprocessing. User
activity 12 has records of actions taken by users on the network that the organization
or enterprise wants to monitor.

Expert system 14, also referred to as a "rule-based" engine, accepts input data
from user activity files 12, which act as features in present security systems. As
mentioned above, the expert system, a term well-understood in the field of computer
science, processes the input features and determines, based on its rules, whether a
violation has occurred or whether there is anomalous activity. In two simple
examples, expert system 14 can contain a rule instructing it to issue an alert message
if a user attempts to log on using an incorrect password more than five consecutive
times or if a user attempts to write to a restricted file more than once.
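
The two example rules above can be sketched as fixed thresholds. This is only an
illustration of the prior-art, rule-based style the text criticizes; the names
UserCounts, rule_based_alerts, and the two constants are hypothetical and do not
come from the source.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical static thresholds mirroring the two examples in the text.
MAX_CONSECUTIVE_BAD_LOGINS = 5
MAX_RESTRICTED_WRITES = 1


@dataclass
class UserCounts:
    consecutive_bad_logins: int
    restricted_write_attempts: int


def rule_based_alerts(counts: UserCounts) -> List[str]:
    """Return the names of the rules a user has violated (empty if none)."""
    reasons = []
    if counts.consecutive_bad_logins > MAX_CONSECUTIVE_BAD_LOGINS:
        reasons.append("bad_logins")
    if counts.restricted_write_attempts > MAX_RESTRICTED_WRITES:
        reasons.append("restricted_write")
    return reasons
```

Because the thresholds are constants, any change in what counts as normal requires
someone to edit them by hand, which is the maintenance burden described in this
section.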

Alert message 16 is issued if a rule threshold is exceeded to inform a network
security analyst that a possible intrusion may be occurring. Typically, alert message
16 contains a score and a reason for the alert, i.e., which rules or thresholds were
violated by a user. As stated above, these thresholds can be outdated or moot if
circumstances change in the system. For example, circumstances can change and the
restricted file mentioned above can be made accessible to a larger group of users. In
this case an expert would have to modify the rules in expert system 14.

As mentioned above, the feature and expert system components as shown in
FIG. 1 and conventional models used in conjunction with these components have
significant drawbacks. One is the cumbersome and overly complex set of rules and
thresholds that must be entered to "cover" all the possible security violations.
Another is the knowledge an expert must have in order to update or modify the rule
base and the model to reflect changing circumstances in the organization. Related to
this is the difficulty in locating an expert to assist in programming and maintaining all
components in the system.

Therefore, it would be desirable to utilize a features generator in place of a
traditional expert system that can automatically update itself to reflect changes in user
and user group current behavior. It would also be desirable to have such a features
generator be self-sufficient and flexible in that it is not dependent on changes by an
expert and is not a rigid rule-based system. That is, the features generator should not
be dependent on or assume to have extensive system domain knowledge. It would
also be desirable to have the features generator use historical and other system data to
modify itself so that it can take into account current user activity behavior and trends.
Summary of the Invention

To achieve the foregoing, methods, apparatus, and computer-readable medium
are disclosed which provide computer network intrusion detection. In one aspect of
the present invention, a method of detecting an intrusion into a computer system is
described. User activity data listing activities performed by users on the computer
system is gathered by the intrusion detection program. Historical information is then
calculated based on the activities performed by users on the computer system. Also
calculated is a feature using the historical information based on the user activities.
The feature is then utilized by a model to obtain a value or score which indicates the
likelihood of an intrusion into the computer network. The historical values are
adjusted according to shifts in normal behavior of users of the computer system. This
allows for calculation of the feature to reflect changing characteristics of the users on
the computer system.

In one embodiment of the present invention user log files are accessed when
gathering the user activity data. In another embodiment the user activity data
corresponds to a previously determined time period. In yet another embodiment a
user historical mean and a user historical standard deviation are calculated for a
particular user based on the user's activity data. In yet another embodiment a peer or
user group historical mean and a peer historical standard deviation are calculated based
on activities performed by the entire user group. In yet another embodiment a feature
is calculated by retrieving the user historical mean and the user historical standard
deviation. This information is then used to compute a deviation of behavior of the
user from the user historical mean. In yet another embodiment further steps taken to
calculate a feature include retrieving the peer historical mean and the peer historical
standard deviation and computing another deviation of behavior of the user from the
peer historical mean.

In another aspect of the present invention a method of generating a feature to
be used in a model is disclosed. User-specific activity data is collected for a
pre-selected number of activities. Based on the user-specific activity data, user-specific
historical data for a particular activity is generated. Peer historical data values are
then generated for the particular activity. The user-specific historical data and the
peer historical data are then utilized to generate a feature associated with the
particular activity. The feature reflects current and past behavior of a particular user
and of a group of users on a computer system with respect to the particular activity.

In one embodiment a user deviation from normal behavior for the particular
activity is calculated. In another embodiment a deviation from peer normal activity
by the particular user for the activity is calculated. In yet another embodiment
generating user-specific historical data for a particular activity involves determining
the number of times the particular activity was performed by a user during a specific
time period. A previous user historical mean value is calculated and is associated
with the particular activity using the number of times the activity was performed. A
current user historical mean value is calculated and a previous user historical standard
deviation value is calculated and is associated with the particular activity using the number
of times the activity was performed. This leads to a current user historical standard
deviation value.

In another aspect of the present invention a computer network intrusion
detection system is described. The intrusion detection system includes a user activity
data file that contains user-specific data related to activities performed by a particular
user. A historical data file contains statistical and historical data related to past
behavior of the user and of the user's peer group. A features generator or builder
accepts as input the user-specific data and the statistical data related to past behavior
of a user and of a peer group. This allows the features generator to calculate a feature
based on current and past behavior of the user and the current and past behavior of the
peer group.

In one embodiment the network intrusion detection system contains a model
trained to accept as input a feature generated by the features generator and to output a
score indicating the likelihood that a particular activity is an intrusion. In another
embodiment the user activity data file includes a user identifier, an activity
description, and a timestamp. In yet another embodiment, the network intrusion
detection system includes a features list logically segmented where each segment
corresponds to a user and contains values corresponding to activities performed by the
user. A segment in the features list has a section that contains user-related values
indicating the degree of normality or abnormality of the user's behavior compared to
prior behavior. Another section in a segment contains peer-related values indicating
the degree of normality or abnormality of the user's behavior compared to behavior of
the user's peers. In yet another embodiment the historical data file contains user and
peer historical means and user and peer historical standard deviations.
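
The segmented features list described above could be represented roughly as
follows. This is a minimal sketch of one possible layout, not the layout the
document specifies; UserSegment, FeaturesList, set_feature, and the activity name
"login_failure" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UserSegment:
    """One segment of the features list, belonging to a single user."""
    # Deviation of the user from the user's own prior behavior, per activity.
    user_related: Dict[str, float] = field(default_factory=dict)
    # Deviation of the user from the peer group's behavior, per activity.
    peer_related: Dict[str, float] = field(default_factory=dict)


# The features list maps a user identifier to that user's segment.
FeaturesList = Dict[str, UserSegment]


def set_feature(features: FeaturesList, user: str, activity: str,
                user_value: float, peer_value: float) -> None:
    """Store both values for one activity, creating the segment if needed."""
    segment = features.setdefault(user, UserSegment())
    segment.user_related[activity] = user_value
    segment.peer_related[activity] = peer_value
```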



WO 01/31420    PCT/US00/29490

Brief Description of the Drawings
The invention may be best understood by reference to the following 
description taken in conjunction with the accompanying drawings in which: 

FIG. 1 is a block diagram of a features/expert system component of a security
system in a computer network as is presently known in the art.

FIG. 2 is a block diagram of a computer network security system in
accordance with the described embodiment of the present invention.

FIG. 3 is a schematic diagram showing the formation of user activity log files,
or the raw user data, in accordance with one embodiment of the present invention.

FIG. 4 is a flow diagram of a process for generating user historical data in
accordance with one embodiment of the present invention.

FIG. 5 is a flow diagram of a process for generating peer historical data in
accordance with one embodiment of the present invention.

FIG. 6 is a flow diagram of a process for generating a features list containing
data on a user's activity in accordance with one embodiment of the present invention.

FIG. 7 is a flow diagram of a process for generating another portion of a
features list related to a user's activity relative to peer activity in accordance with one
embodiment of the present invention.

FIG. 8 is a schematic diagram of a features list in accordance with one 
embodiment of the present invention. 

FIG. 9 is a block diagram of a typical computer system suitable for 
implementing an embodiment of the present invention. 
Detailed Description

Reference will now be made in detail to a preferred embodiment of the
invention. An example of the preferred embodiment is illustrated in the
accompanying drawings. While the invention will be described in conjunction with a
preferred embodiment, it will be understood that it is not intended to limit the
invention to one preferred embodiment. To the contrary, it is intended to cover
alternatives, modifications, and equivalents as may be included within the spirit and
scope of the invention as defined by the appended claims.

A method and system for using historical and statistical data in conjunction
with current user activity data to derive features for use in a computer network
intrusion detection program is described in the various figures. The techniques used
in the present invention take user and peer activity data and calculate means and
standard deviations based on the activity data which are then used to generate a
features list. By using the historical data, the features generator can take into account
changing behavior of the user and of the user's peers, and need not depend on
extensive domain knowledge. The features list is then used as input to a model
which, in turn, outputs a score or value indicating the level of a possible intrusion.

FIG. 2 is a block diagram of a computer network security system 100 in
accordance with the described embodiment of the present invention. User activity
files 12 are generally the same as those shown in FIG. 1. These files contain raw user
data generated from various system resources and, in the described embodiment, are
parsed and organized according to user and time of activity. They are described in
greater detail in FIG. 3. Historical data 102 contains data relating to prior activity
performed by a user and cumulative data of activities performed by the peer group
(including the user) in a particular time frame. In other embodiments, smaller or
larger groups, different from the user peer group, can be monitored. In the described
embodiment the peer group is all users in a particular system who have logged in for a
particular time period, such as a typical work day. The generation of user historical
data is described in greater detail in FIG. 4 and the generation of user peer group
historical data is described in greater detail in FIG. 5.

User activity files 12 and historical data 102 are used as input to a feature
generator or builder 104. In the described embodiment, feature generator 104 is
implemented involving an equation for calculating a time-weighted mean, discussed
in greater detail in FIGS. 6 and 7. The output from feature generator 104 is a features
list 106. In the described embodiment, features list 106 contains 47 features which
can be classified into several different categories such as violations, user activities,
computer and network loads, and so on. Characteristics of features list 106 are
described in greater detail in FIG. 8. Individual features from features list 106 are
used as input to a model 108. As is well known in the field of computer science,
there are many different model processes, such as linear regression, Markov models,
graphical models, and regression models. A model is trained to evaluate features to
recognize the possibility of a network intrusion. By training model 108 to process
certain types of features, it can recognize potential intrusions. As is well known in the
art, a model can accept different types of features. One example of a feature is user
login failure, such as the time between login failures for a particular user. Once the
model receives all input features, it calculates a score 110. This score is based on
the input features and how the model has been trained. In the described embodiment,
the model is trained using a neural network algorithm. A score 110 can be
normalized to a number between 0 and 1000, a high number indicating a stronger
possibility of an intrusion.

FIG. 3 is a schematic diagram showing the formation of user activity files 12,
or the raw user data, in accordance with one embodiment of the present invention. As
mentioned above, user activity files 12 contain raw data of activities performed by
users. As described below, user activity files 12 are made up of numerous individual
user logs, such as user log 204 in FIG. 3. In the described embodiment, the users are
on one particular computer system, typically supported by a mainframe computer and
operating system. In other embodiments, the raw data can come from several
computer systems each supported by different computers. Similarly, score 110 can be
derived from data from one or more computer systems and can measure potential
intrusions for one or all systems. A computer system 200 is shown containing a
number of sources from which raw user activity data is drawn. Examples of these
sources or files include operating system files containing executed commands,
operations on programs, exceptions, operations on files, and other more data-specific
files such as badge-in data. In the described embodiment the sources are maintained
by the Multiple Virtual Storage ("MVS") operating system of the IBM Corporation,
and used on IBM mainframe computers. These data sources are part of the MVS
operating system and are created and maintained as part of the operating system. The
process can be used in computer systems using operating systems other than MVS
such as a Unix-based operating system. Using the example from above, to determine
the time between login failures, the intrusion program checks user activity files 12.

A raw data log 202 contains user activity for all users logged in a particular
computer system such as system 200. Computer system 200 parses raw data log 202
according to user and time of activity, thereby creating a series of individual user logs,
such as user log 204. In the described embodiment, user log 204 is a series of
variable length records containing a user name, a timestamp of when the user activity
occurred and the name of the specific user activity, as well as other information
depending on the user activity or command performed. After data from the system
resources is parsed according to user, user activity data is retained or kept in the form
of user activity files 12, used as input to feature generator 104.
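
The parsing of a raw data log into individual user logs might look like the
sketch below. The record layout (user name, timestamp, activity name) follows
the paragraph above; the Record type, the split_by_user function, and the use of
ISO-8601 timestamp strings are assumptions made only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    user: str
    timestamp: str  # assumed ISO-8601 text, so string order equals time order
    activity: str


def split_by_user(raw_log: List[Record]) -> Dict[str, List[Record]]:
    """Group raw records by user and order each user's log by time."""
    user_logs = defaultdict(list)
    for record in raw_log:
        user_logs[record.user].append(record)
    for records in user_logs.values():
        records.sort(key=lambda r: r.timestamp)
    return dict(user_logs)
```

A record here carries only the three fields named in the text; the "other
information" that depends on the activity is left out of the sketch.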

FIG. 4 is a flow diagram of a process for generating user historical data in
accordance with one embodiment of the present invention. In the described
embodiment the process is performed at the end of a user work day for each user
logged in and for each computer system in an organization or enterprise. Thus, in the
described embodiment, user historical data is generated once a day. In other
embodiments, historical data can be generated more or less frequently depending on
characteristics of the system, number of users, and the degree of intrusion detection
desired. Generally, each activity is examined for a particular user and a statistical
mean, or equivalent value, is calculated for that user for a particular day.

At step 300 a user is selected from a corpus of users who have logged onto a
computer system for a particular day. In the described embodiment, historical data is
generated for users who have logged on and performed at least some activities during
the day. At step 302 a particular activity is selected from a predetermined list of
activities that are monitored by the intrusion detection system. In the described
embodiment, the activities can be divided into several categories such as
login failures, failures related to accessing a file, normal activity, resource usage, and
others. In the described embodiment there is a predetermined set of 47 activities from
which activities are selected.

At step 304 the intrusion detection program determines the number of times
the selected activity is performed on a particular day by the selected user. In the
described embodiment this is determined using a counter. The total number of times
the selected activity is performed by the user is stored as sum_i. Sum_i is not
necessarily the number of times an activity is performed. It can also represent the
total resource usage, total number of bytes transferred, among other quantities (i.e., it
is not necessarily a counter). At step 306 sum_i is used to calculate a historical mean
of sum_i by the user alone. In the described embodiment this is done by comparing
sum_i to a historical mean calculated for all or a predetermined number of previous
sums. This historical mean is a time-weighted mean updated based on the new sum_i.
In addition, the previous historical mean (i.e., the historical mean from the previous
login period) is updated to reflect the new sum_i. The new user historical mean is
saved in user and peer historical data file 102 as shown in FIG. 2.

At step 308 sum_i is used to update a user historical standard deviation. In the
described embodiment, this standard deviation is calculated for the selected user for
that particular day. As with the user historical mean, a historical standard deviation is
calculated using sum_i and is stored in user historical file 102 from where it is used as
input to feature generator 104. At step 310 the intrusion detection program
determines whether there are any remaining activities to be examined from the
activity list. If so, control returns to step 302 where the next activity is selected and
the process is repeated. If there are no more activities in the list, the processing for
generating historical data for a single user for a particular day is complete. The user
historical standard deviation and historical mean values collectively comprise the user
historical data which is subsequently used as one input to features generator 104.
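
One way to realize steps 306 and 308 is an exponentially weighted update, shown
below as a minimal sketch. The text says only that the mean is "time-weighted";
the exponential form, the smoothing factor alpha, and the names History and
update_history are assumptions and should not be read as the equation used in the
described embodiment.

```python
import math
from dataclasses import dataclass


@dataclass
class History:
    """Historical values kept per user (or peer group) and per activity."""
    mean: float = 0.0
    var: float = 0.0  # time-weighted variance; the std is its square root

    @property
    def std(self) -> float:
        return math.sqrt(self.var)


def update_history(hist: History, sum_i: float, alpha: float = 0.1) -> History:
    """Fold one period's sum_i into the history.

    alpha (assumed, 0 < alpha <= 1) sets how quickly old periods fade:
    a larger alpha gives recent behavior more weight.
    """
    delta = sum_i - hist.mean
    new_mean = hist.mean + alpha * delta
    # Standard recursion for an exponentially weighted variance.
    new_var = (1.0 - alpha) * (hist.var + alpha * delta * delta)
    return History(new_mean, new_var)
```

Calling update_history once per day for each activity keeps only two numbers per
activity, so no list of previous sums has to be stored.
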

FIG. 5 is a flow diagram of a process for generating peer historical data in
accordance with one embodiment of the present invention. This process is different
from that depicted in FIG. 4 in that the historical data calculated here relates to the
entire group of users logged onto a computer system for a particular day instead of
just one selected user. In the described embodiment, this peer group includes the
selected user as well. The peer group (which can be viewed as a fictitious user) can
change frequently depending on who logs on the computer system.

At step 502 a peer group is formed based on all the users logged on the 
computer system that day. In other embodiments, there can be more than one 
computer system from which a peer group is formed or certain users from all those 
logged on may be excluded from the peer group if needed. Once the peer group is 
formed, an activity is selected at step 504. The activities are from the same list of 
activities used in step 302 of FIG. 4, having 47 activities in the described 
embodiment. 

At step 506 another sum_i is calculated based on the number of times each
person in the peer group performed the selected activity in a particular time period. It
is possible that some of the users in the peer group may not have performed the
selected activity. At step 508 a peer historical mean is updated using sum_i in a
manner similar to calculating the user historical mean. In the described embodiment
this is done by comparing sum_i to a historical mean calculated for all or a
predetermined number of previous sums. This peer historical mean is also a
time-weighted mean updated based on the new sum_i. In addition, the previous historical
mean (i.e., the historical mean from the previous login period) is updated to reflect the
new sum_i. At step 510 the peer historical standard deviation is calculated in a
manner similar to the user historical standard deviation as described in step 308 of
FIG. 4. The peer historical mean and standard deviation values are saved in user and
peer historical files 102 with the user historical data.
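
Step 506 can be sketched as a total over the peer group, with users who did not
perform the activity contributing zero. Whether the embodiment uses a plain total
or some per-user average is not stated, so the total is an assumption, as are the
names peer_sum and counts_by_user.

```python
from typing import Dict


def peer_sum(counts_by_user: Dict[str, Dict[str, float]], activity: str) -> float:
    """Total of one activity over every user in the peer group for the period."""
    return sum(
        user_counts.get(activity, 0.0)  # a user who never did it counts as 0
        for user_counts in counts_by_user.values()
    )
```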

The peer historical standard deviation can be used to assign various
weightings to the peer historical mean based on several criteria, such as time or other
factors in the system. For example, a peer historical mean calculated four months
prior to the present can be assigned a lighter weight than the historical mean
calculated two days prior to the present with regard to determining the standard
deviation. This is based on the assumption that behavior from two days ago should be
given more importance than behavior from four months ago. In another example, a
higher or lower weight can be assigned based on particular days of the week.
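
The idea of giving recent values more weight than old ones can be illustrated
with weights that halve every fixed number of days. This is a sketch under that
assumption only; the half-life scheme, the default of 30 days, and the name
age_weighted_stats are hypothetical and are not taken from the source.

```python
import math
from typing import List, Tuple


def age_weighted_stats(samples: List[Tuple[float, float]],
                       half_life_days: float = 30.0) -> Tuple[float, float]:
    """Return (weighted mean, weighted standard deviation).

    samples holds (age_in_days, value) pairs; a value that is
    half_life_days old counts half as much as one from today.
    """
    weights = [0.5 ** (age / half_life_days) for age, _ in samples]
    total = sum(weights)
    mean = sum(w * v for w, (_, v) in zip(weights, samples)) / total
    var = sum(w * (v - mean) ** 2 for w, (_, v) in zip(weights, samples)) / total
    return mean, math.sqrt(var)
```

With a value from two days ago and one from roughly four months ago, the result
sits much closer to the recent value, matching the example in the text. A
day-of-week weighting could be added by multiplying each weight by a per-day
factor.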

At step 512 the intrusion detection program determines whether there are any 
other activities from the predetermined list of activities to be examined. If so, control 
returns to step 504 where another activity is selected and the process is repeated. If 
there are no more activities, the process of generating peer historical data is complete. 

20 FIG. 6 is a flow diagram of a process for generating a features list containing 

data on a user's activity in accordance with one embodiment of the present invention. 

The process of FIG. 6 depicts generation of a features list for a particular user for a 

particular time period, such as one day. The time period can be adjusted based on the 

needs of the systems and the desired accuracy of the intrusion detection program. In 
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•he described embodtment, the features lis, is a lis, of rea, numbers ranging from -5 ,„ 
5 where a iow negative number ind,ca,es behavior iess ,han normal and a positive 
number ,nd,ca,es behavior more frequent than notntal. A, step 602 an activity 
performed by a particular user is chosen from that user's activity lis, as was done in 
5 step 302 of FIG. 4. In ,he described embodtmem, a features lis,, such as features lis, 
1 06, is orgamzed firs, by user, and wtthin a user, by acivi.y. In other embod.ments 
■he features lis, can be organized differently depending on requirements of me system. 
At step 604 the fea.ures generator, such as fea.ures generator 1 04, retrieves ,he user's 
histoncal mean and his.oncal standard devtanon for ,he selected acttvity. These 
.0 values are drawn from user and peer historical data file 102. 

At step 606 the features generator determines whether a user's activity for that
day with respect to the selected activity is normal or deviates from past behavior. In
the described embodiment this determination is made by calculating a normalized
deviation of the user's historical mean from the user's activity for that particular day.
That is, how far off the user's behavior is from the user's historical mean. In the
described embodiment, this is done by subtracting the user historical mean from the
activity level and dividing the result by the user historical standard deviation. This
calculation is recorded as a value in the range of -5 to 5 as described above. This
value is then stored in features list 106 at step 608. A features list is described in FIG.
8 below. At step 610 the intrusion detection program determines whether there are
any remaining activities in the activity list for the selected user. If there are, control
then returns to step 602 where another activity is selected and the process is repeated.
If there are no more activities, the process of generating the user-specific portion of
the features list is complete. Thus, a portion of the features list which contains each
of a selected user's activities and a corresponding score indicating how close the
user's actions are to previous behavior is completed.
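
The calculation of steps 604 through 608 can be sketched in Python as follows, as an illustration rather than the implementation of the described embodiment. The sketch assumes that values outside the range are clamped to -5 and 5 and that a zero standard deviation is handled as a special case; neither detail is stated in the text. The names normalized_deviation and user_features are hypothetical.

```python
def normalized_deviation(activity_level, hist_mean, hist_std, limit=5.0):
    """Score how far one period's activity level is from the historical mean."""
    if hist_std <= 0:
        # Assumption: with no recorded variation, any departure counts as extreme.
        if activity_level == hist_mean:
            return 0.0
        return limit if activity_level > hist_mean else -limit
    score = (activity_level - hist_mean) / hist_std
    # Assumption: out-of-range scores are clamped to [-limit, limit].
    return max(-limit, min(limit, score))


def user_features(todays_counts, user_history):
    """Build the user-specific portion of a features list for one user.

    todays_counts maps an activity name to its count for the period.
    user_history maps an activity name to (historical mean, historical std).
    """
    features = []
    for activity in sorted(todays_counts):  # fixed activity order
        mean, std = user_history[activity]
        features.append(normalized_deviation(todays_counts[activity], mean, std))
    return features
```

For example, a user with a historical mean of 10 logins per day and a standard deviation of 2 who logs in 14 times receives a score of 2.0 for that activity.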

FIG. 7 is a flow diagram of a process for generating another portion of a
features list related to a user's activity relative to peer activity in accordance with one
embodiment of the present invention. The steps described here are similar to those
described in FIG. 6 except values used relate to peer data instead of user data. At step
702 an activity is selected for a particular user. In the described embodiment, this
step is the same as step 602. At step 704 the peer historical mean and peer historical
standard deviation are retrieved from the user and peer historical data files 102.
These values are computed at steps 508 and 510 of FIG. 5 using peer historical data.
At step 706 the behavior corresponding to the selected activity by the user is
compared to typical behavior of the user's peers for that activity. Any deviation by
the user from normal peer activity is computed, i.e., any abnormal behavior is
measured. This is done by subtracting the user's current activity value from the peer
historical mean and dividing the result by the peer historical standard deviation. This
deviation or anomalous behavior is translated into a numerical value and added to the
features list 106 at step 708. As with deviation from the user's own behavior, in the
described embodiment this value is measured as a real number in the range of -5 to 5.
At step 710 the intrusion program determines whether there are any more activities in
the activity list. If there are, control returns to step 702. If not, the process is done
and a complete features list has been created.
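
The peer comparison of steps 704 through 708 can be sketched in the same way. This Python sketch is illustrative only. The sign convention used here (activity level minus peer mean, so that a positive score means more activity than the peer group) is an assumption chosen to match the convention stated for FIG. 6, and clamping to the range -5 to 5 is likewise assumed. The names clamp, peer_deviation and peer_features are hypothetical.

```python
def clamp(value, limit=5.0):
    """Restrict a score to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def peer_deviation(activity_level, peer_mean, peer_std, limit=5.0):
    """Score a user's activity level against the peer group's history."""
    if peer_std <= 0:
        # Assumption: this sketch simply rejects a degenerate peer history.
        raise ValueError("peer standard deviation must be positive")
    return clamp((activity_level - peer_mean) / peer_std, limit)


def peer_features(todays_counts, peer_history):
    """Build the peer-related portion of a features list for one user.

    todays_counts maps an activity name to the user's count for the period.
    peer_history maps an activity name to (peer mean, peer std).
    """
    return [
        peer_deviation(todays_counts[activity], *peer_history[activity])
        for activity in sorted(todays_counts)  # same order as the user portion
    ]
```

A user who issues 40 privileged commands in a day may be entirely normal relative to that user's own history, yet scores 3.5 against a peer group whose mean is 5 and whose standard deviation is 10. This is the kind of case the peer-related portion is meant to surface.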

FIG. 8 is a schematic diagram of a features list in accordance with one
embodiment of the present invention. As described above features list 106 contains a
series of values corresponding to a deviation of the user's behavior from the user's
past behavior and the behavior of the user's peer group for various activities.
Features list 106 contains a series of values, each value corresponding to a particular
activity for a particular user. The feature values for one user are grouped together. In
the described embodiment, features for each user are divided into two sections. An
example of a first section of features 802 corresponds to values comparing a user's
behavior to the user's past behavior. Examples of individual values are shown as
values 804. A process for generating these scores is described in FIG. 6. The number
of activities tracked by the intrusion detection program can vary. Examples of various
categories of these activities are described above. The types of activities monitored
by the intrusion program can vary from [...]
and type of security desired.

A second section 806 corresponds to feature values derived from deviations of
the user's behavior from the user's peer behavior for a particular activity. A process
for generating these values is described in FIG. 7. In the described embodiment, the
number of activities in the two sections is the same. Following section 806 is another
section similar to section 802 for another user. As previously explained in FIG. 2,
a peer group can
be defined in various ways, such as by privileged users as opposed to normal users, by
system, or level of activity.
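
One way to picture the layout of FIG. 8 is as a flat list of scores made up of one segment per user, each segment holding a user-related section followed by a peer-related section of the same length. The Python sketch below is an assumption about representation only; the text does not prescribe a storage format. The names Segment, flatten and split are hypothetical.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    user_id: str
    user_section: List[float]  # deviations from the user's own history (cf. section 802)
    peer_section: List[float]  # deviations from the peer group (cf. section 806)


def flatten(segments: List[Segment]) -> List[float]:
    """Concatenate segments user by user: user section first, then peer section."""
    flat: List[float] = []
    for seg in segments:
        if len(seg.user_section) != len(seg.peer_section):
            raise ValueError("both sections must cover the same activities")
        flat.extend(seg.user_section)
        flat.extend(seg.peer_section)
    return flat


def split(flat: List[float], user_ids: List[str], n_activities: int) -> List[Segment]:
    """Recover the per-user segments from the flat list, given the user order."""
    width = 2 * n_activities
    if len(flat) != width * len(user_ids):
        raise ValueError("flat list length does not match users x activities")
    segments = []
    for i, user_id in enumerate(user_ids):
        start = i * width
        segments.append(Segment(
            user_id,
            flat[start:start + n_activities],
            flat[start + n_activities:start + width],
        ))
    return segments
```

Because both sections cover the same activities in the same order, the score at a given position in the user-related section and the score at the same position in the peer-related section describe the same activity.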

As described above, the present invention employs various computer-implemented
operations involving data stored in computer systems. These operations
include, but are not limited to, those requiring physical manipulation of physical
quantities. Usually, though not necessarily, these quantities take the form of electrical
or magnetic signals capable of being stored, transferred, combined, compared, and
otherwise manipulated. The operations described herein that form part of the
invention are useful machine operations. The manipulations performed are often
referred to in terms, such as, producing, matching, identifying, running, determining,
comparing, executing, downloading, or detecting. It is sometimes convenient,
principally for reasons of common usage, to refer to these electrical or magnetic
signals as bits, values, elements, variables, characters, data, or the like. It should be
remembered, however, that all of these and similar terms are to be associated with the
appropriate physical quantities and are merely convenient labels applied to these
quantities.

The present invention also relates to a computer device, system or apparatus 
for performing the aforementioned operations. The system may be specially 
constructed for the required purposes, or it may be a general purpose computer, such 
as a server computer or a mainframe computer, selectively activated or configured by 
a computer program stored in the computer. The processes presented above are not
inherently related to any particular computer or other computing apparatus. In 
particular, various general purpose computers may be used with programs written in 
accordance with the teachings herein, or, alternatively, it may be more convenient to 
construct a more specialized computer system to perform the required operations. 

FIG. 9 is a block diagram of a general purpose computer system 900 suitable
for carrying out the processing in accordance with one embodiment of the present
invention. FIG. 9 illustrates one embodiment of a general purpose computer system
that, as mentioned above, can be a server computer, a client computer, or a mainframe
computer. Other computer system architectures and configurations can be used for
carrying out the processing of the present invention. Computer system 900, made up
of various subsystems described below, includes at least one microprocessor
subsystem (also referred to as a central processing unit, or CPU) 902. That is, CPU
902 can be implemented by a single-chip processor or by multiple processors. CPU
902 is a general purpose digital processor which controls the operation of the
computer system 900. Using instructions retrieved from memory, the CPU 902
controls the reception and manipulation of input data, and the output and display of
data on output devices.

CPU 902 is coupled bi-directionally with a first primary storage 904, typically
a random access memory (RAM), and uni-directionally with a second primary storage
area 906, typically a read-only memory (ROM), via a memory bus 908. As is well
known in the art, primary storage 904 can be used as a general storage area and as
scratch-pad memory, and can also be used to store input data and processed data, such
as command and program name sequences. It can also store programming
instructions and data, in the form of a message store in addition to other data and
instructions for processes operating on CPU 902, and is typically used for fast
transfer of data and instructions in a bi-directional manner over the memory bus 908.
Also as well known in the art, primary storage 906 typically includes basic operating
instructions, program code, data, and objects used by the CPU 902 to perform its
functions. Primary storage devices 904 and 906 may include any suitable
computer-readable storage media, described below, depending on whether, for example,
data access needs to be bi-directional or uni-directional. CPU 902 can also directly and
very rapidly retrieve and store frequently needed data in a cache memory 910.

A removable mass storage device 912 provides additional data storage
capacity for the computer system 900, and is coupled either bi-directionally or
uni-directionally to CPU 902 via a peripheral bus 914. For example, a specific removable
mass storage device commonly known as a CD-ROM typically passes data
uni-directionally to the CPU 902, whereas a floppy disk can pass data bi-directionally to
the CPU 902. Storage 912 may also include computer-readable media such as
magnetic tape, flash memory, signals embodied on a carrier wave, smart cards,
portable mass storage devices, holographic storage devices, and other storage devices.
A fixed mass storage 916 also provides additional data storage capacity and is
coupled bi-directionally to CPU 902 via peripheral bus 914. The most common
example of mass storage 916 is a hard disk drive. Generally, access to these media is
slower than access to primary storages 904 and 906. Mass storage 912 and 916
generally store additional programming instructions, data, and the like that typically
are not in active use by the CPU 902. It will be appreciated that the information
retained within mass storage 912 and 916 may be incorporated, if needed, in standard
fashion as part of primary storage 904 (e.g. RAM) as virtual memory.

In addition to providing CPU 902 access to storage subsystems, the peripheral
bus 914 is used to provide access to other subsystems and devices as well. In the
described embodiment, these include a display monitor 918 and adapter 920, a printer
device 922, a network interface 924, an auxiliary input/output device interface 926, a
sound card 928 and speakers 930, and other subsystems as needed.

The network interface 924 allows CPU 902 to be coupled to another
computer, computer network, including the Internet or an intranet, or
telecommunications network using a network connection as shown. Through the
network interface 924, it is contemplated that the CPU 902 might receive information,
e.g., data objects or program instructions, from another network, or might output
information to another network in the course of performing the above-described
method steps. Information, often represented as a sequence of instructions to be
executed on a CPU, may be received from and outputted to another network, for
example, in the form of a computer data signal embodied in a carrier wave. An
interface card or similar device and appropriate software implemented by CPU 902
can be used to connect the computer system 900 to an external network and transfer
data according to standard protocols. That is, method embodiments of the present
invention may execute solely upon CPU 902, or may be performed across a network
such as the Internet, intranet networks, or local area networks, in conjunction with a
remote CPU that shares a portion of the processing. Additional mass storage devices
(not shown) may also be connected to CPU 902 through network interface 924.

Auxiliary I/O device interface 926 represents general and customized
interfaces that allow the CPU 902 to send and, more typically, receive data from other
devices such as microphones, touch-sensitive displays, transducer card readers, tape
readers, voice or handwriting recognizers, biometrics readers, cameras, portable mass
storage devices, and other computers.

Also coupled to the CPU 902 is a keyboard controller 932 via a local bus 934
for receiving input from a keyboard 936 or a pointer device 938, and sending decoded
symbols from the keyboard 936 or pointer device 938 to the CPU 902. The pointer
device may be a mouse, stylus, track ball, or tablet, and is useful for interacting with a
graphical user interface.

In addition, embodiments of the present invention further relate to computer
storage products with a computer readable medium that contain program code for
performing various computer-implemented operations. The computer-readable
medium is any data storage device that can store data that can thereafter be read by a
computer system. The media and program code may be those specially designed and
constructed for the purposes of the present invention, or they may be of the kind well
known to those of ordinary skill in the computer software arts. Examples of
computer-readable media include, but are not limited to, all the media mentioned
above: magnetic media such as hard disks, floppy disks, and magnetic tape; optical
media such as CD-ROM disks; magneto-optical media such as floptical disks; and
specially configured hardware devices such as application-specific integrated circuits
(ASICs), programmable logic devices (PLDs), and ROM and RAM devices. The
computer-readable medium can also be distributed as a data signal embodied in a
carrier wave over a network of coupled computer systems so that the
computer-readable code is stored and executed in a distributed fashion. Examples of program
code include both machine code, as produced, for example, by a compiler, or files
containing higher level code that may be executed using an interpreter.

It will be appreciated by those skilled in the art that the above described 
hardware and software elements are of standard design and construction. Other 
computer systems suitable for use with the invention may include additional or fewer 
subsystems. In addition, memory bus 908, peripheral bus 914, and local bus 934 are 
illustrative of any interconnection scheme serving to link the subsystems. For
example, a local bus could be used to connect the CPU to fixed mass storage 916 and
display adapter 920. The computer system shown in FIG. 9 is but an example of a 
computer system suitable for use with the invention. Other computer architectures 
having different configurations of subsystems may also be utilized. 

Although the foregoing invention has been described in some detail for
purposes of clarity of understanding, it will be apparent that certain changes and
modifications may be practiced within the scope of the appended claims.
Furthermore, it should be noted that there are alternative ways of implementing both
the process and apparatus of the present invention. For example, the number and
types of features used can vary depending on the security needs of the computer
network. In another example, the methods and systems described can run in operating
systems other than MVS, such as the Windows NT™ operating system or [...]
operating system. In yet another example, formulas or algorithms can be used to
calculate the described mean and standard deviation values other than the ones
described. In addition, the network intrusion detection system can be used in other
applications, such as in medical applications. Accordingly, the present embodiments
are to be considered as illustrative and not restrictive, and the invention is not to be
limited to the details given herein, but may be modified within the scope and
equivalents of the appended claims.






WO 01/31420 



PCT/US00/29490



Claims 



What is claimed is: 

1. A method of detecting an intrusion into a computer system, the method
comprising:

gathering user activity data corresponding to activities performed by an
individual user;

calculating historical values based on activities performed by users on the
computer system;

calculating a feature using the historical values and the user activity data; and

utilizing the feature in a model to obtain a value indicating the likelihood of an
intrusion whereby the historical values are adjusted according to shifts in normal
behavior of users thereby enabling calculation of the feature to reflect changing
characteristics of behavior of the users on the computer system.

2. A method as recited in claim 1 wherein gathering user activity data further
includes: accessing user log files organized according to user and time.

3. A method as recited in claim 1 wherein gathering user activity data further
includes:

retrieving user activity data corresponding to a predetermined time period.

4. A method as recited in claim 1 further comprising gathering peer historical
data including cumulative data of activities performed by a peer group.

5. A method as recited in claim 1 wherein calculating historical values further
includes:

calculating a user historical mean and a user historical standard deviation for a
selected user.

6. A method as recited in claim 1 wherein calculating historical values further
includes accessing the user activity data at predetermined time intervals.

7. A method as recited in claim 5 further comprising: calculating a peer historical
mean and a peer historical standard deviation.

8. A method as recited in claim 5 wherein calculating a user historical mean and
a user historical standard deviation further includes:

examining activities performed by the individual user.

9. A method as recited in claim 5 further comprising counting the number of
times an activity is performed by the individual user.

10. A method as recited in claim 1 wherein calculating a feature further includes:

retrieving the user historical mean and the user historical standard deviation;
and

computing a first deviation of behavior of the selected user from the
historical mean.

11. A method as recited in claim 10 wherein calculating a feature further includes:

retrieving the peer historical mean and the peer historical standard deviation;
and

computing a second deviation of behavior of the selected user from the peer
historical mean.

12. A method as recited in claim 10 wherein the user historical mean for a
particular activity is calculated based on a time-weighted user historical standard
deviation.

13. A method as recited in claim 11 wherein the peer historical mean for a
particular activity is calculated based on a time-weighted peer historical standard
deviation.

14. A method as recited in claim 5 further including calculating a normalized user
deviation from normal behavior of the individual user using the user activity data.




15. A method of generating a feature to be used in a model, the method
comprising:

collecting user-specific activity data for a plurality of activities;

generating user-specific historical data for a particular activity utilizing the
user-specific activity data;

generating peer historical data for the particular activity;

utilizing the user-specific historical data and the peer historical data to
generate a feature associated with the particular activity wherein the feature reflects
current behavior and past behavior of a particular user and of a group of users on a
computer system with respect to the particular activity.

16. A method as recited in claim 15 wherein utilizing the user-specific historical
data and the peer historical data to generate a feature further comprises:

computing a user deviation from normal behavior of the particular user for the
particular activity.

17. A method as recited in claim 15 wherein utilizing the user-specific historical
data and the peer historical data to generate a feature further comprises:

computing a peer deviation from normal behavior of the particular user for the
particular activity.

18. A method as recited in claim 15 wherein generating user-specific historical
data for a particular activity utilizing the user-specific activity data further comprises:

determining a first count of the number of times the particular activity was
performed by the user in a predetermined time period;

updating a previous user historical mean value associated with the particular
activity using the first count thereby deriving a current user historical mean value; and

updating a previous user historical standard deviation value associated with
the particular activity using the first count thereby deriving a current user historical
standard deviation value.

19. A method as recited in claim 15 wherein determining a first count further 

20. A method as recited in claim 19 wherein the user-specific activity data
includes a user identifier, an activity descriptor, and an activity timestamp.

21. A method as recited in claim 15 wherein generating peer historical data for a
particular activity further includes:

determining a second count of the number of times the particular activity was
performed by the group of users in a predetermined time period;

updating a previous peer historical mean value associated with the particular
activity using the second count thereby deriving a current peer historical mean value;

updating a previous peer historical standard deviation value associated with
the particular activity using the second count thereby deriving a current peer historical
standard deviation value.



22. A computer network intrusion detection system comprising:

a user activity data file containing user-specific data related to activities
performed by a particular user;

a historical data file containing statistical data related to past behavior of a
user and of a peer group; and

a features generator accepting as input the user-specific data and the statistical
data related to past behavior of a user and of a peer group wherein the features
generator calculates a feature based on current and past behavior of the user and
current and past behavior of the peer group.

23. A network intrusion detection system as recited in claim 22 further
comprising:

[...] input a feature [...] by the [...]
and to output a score indicating the likelihood that a particular activity is an intrusion.

24. A network intrusion detection system as recited in claim 22 further
comprising:

a features list having a plurality of segments, a segment corresponding to a
user and containing a plurality of values corresponding to activities performed on the
system.

25. A network intrusion detection system as recited in claim 24 wherein a segment
in the features list includes a first section storing a plurality of user-related values and
a second section storing a plurality of peer-related values.

26. A network intrusion detection system as recited in claim 22 wherein the user
activity data file further includes a user identifier, an activity description, and a
timestamp.

27. A network intrusion detection system as recited in claim 22 wherein the
historical data file further includes a user historical mean and a peer historical mean.

28. A network intrusion detection system as recited in claim 22 wherein the
historical data file further includes a user historical standard deviation and a peer
historical standard deviation.

29. A computer-readable medium containing programmed instructions arranged to
detect an intrusion into a computer system, the computer-readable medium including
programmed instructions for:

gathering user activity data corresponding to activities performed by an
individual user;

calculating historical values based on activities performed by users on the
computer system;

calculating a feature using the historical values and the user activity data; and

utilizing the feature in a model to obtain a value indicating the likelihood of an
intrusion whereby the historical values are adjusted according to shifts in normal
behavior of users thereby enabling calculation of the feature to reflect changing
characteristics of behavior of the users on the computer system.

30. A computer-readable medium as recited in claim 29 further comprising 

31. A computer-readable medium containing programmed instructions arranged to
generate a feature to be used in a model, the computer-readable medium including
programmed instructions for:

collecting user-specific activity data for a plurality of activities;

generating user-specific historical data for a particular activity utilizing the
user-specific activity data;

generating peer historical data for the particular activity;

utilizing the user-specific historical data and the peer historical data to
generate a feature associated with the particular activity wherein the feature reflects
current behavior and past behavior of a particular user and of a group of users on a
computer system with respect to the particular activity.